NSF PAR Search | NSF Public Access Repository


Yo’Chameleon: Personalized Vision and Language Generation

Nguyen, Thao; Singh, Krishna Kumar; Shi, Jing; Bui, Trung; Lee, Yong Jae; Li, Yuheng (June 2025, CVPR)

Free, publicly-accessible full text available June 11, 2026
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

https://doi.org/10.18653/v1/2023.findings-emnlp.878

Lai, Viet; Ngo, Nghia; Pouran Ben Veyseh, Amir; Man, Hieu; Dernoncourt, Franck; Bui, Trung; Nguyen, Thien Huu (December 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)
Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

https://doi.org/10.21437/Interspeech.2023-1468

Lai, Viet Dac; Salinas, Abel; Tan, Hao; Bui, Trung; Tran, Quan; Yoon, Seunghyun; Deilamsalehy, Hanieh; Dernoncourt, Franck; Nguyen, Thien Huu (August 2023, Proceedings of the INTERSPEECH Conference)
Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision

Mrini, Khalil; Singh, Harpreet; Dernoncourt, Franck; Yoon, Seunghyun; Bui, Trung; Chang, Walter; Farcas, Emilia; Nakashole, Ndapa (October 2022, Proceedings of the 29th International Conference on Computational Linguistics)

Full Text Available
Keyphrase Prediction from Video Transcripts: New Dataset and Directions

Pouran Ben Veyseh, Amir; Tran, Quan Hung; Yoon, Seunghyun; Manjunatha, Varun; Deilamsalehy, Hanieh; Jain, Rajiv; Bui, Trung; Chang, Walter W.; Dernoncourt, Franck; Nguyen, Thien Huu (October 2022, Proceedings of the 29th International Conference on Computational Linguistics (COLING))

Full Text Available
Learning by Planning: Language-Guided Global Image Editing

Shi, Jing; Xu, Ning; Xu, Yihang; Bui, Trung; Dernoncourt, Franck; Xu, Chenliang (January 2021, IEEE/CVF Conference on Computer Vision and Pattern Recognition)
Full Text Available
PhraseCut: Language-Based Image Segmentation in the Wild

https://doi.org/10.1109/CVPR42600.2020.01023

Wu, Chenyun; Lin, Zhe; Cohen, Scott; Bui, Trung; Maji, Subhransu (June 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
A Benchmark and Baseline for Language-Driven Image Editing

Shi, Jing; Xu, Ning; Bui, Trung; Dernoncourt, Franck; Wen, Zheng; Xu, Chenliang (January 2020, Asian Conference on Computer Vision)
Full Text Available
Visual to Sound: Generating Natural Sound for Videos in the Wild

Zhou, Yipin; Wang, Zhaowen; Fang, Chen; Bui, Trung; Berg, Tamara L. (June 2018, IEEE Conference on Computer Vision and Pattern Recognition)

As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.
Full Text Available
